Similarity searching in the CORDIS text database
Abstract
Similarity searching in text databases with multiple field types is still an open problem. We focus our attention on the “COmmunity Research and Development Information Service” (CORDIS) database of the European Union and we evaluate the effectiveness of many text retrieval methods in terms of precision, recall and ranking quality. Our experiments indicate that different field types should be handled by different retrieval methods.
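The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes each query returns a ranked list of unique document identifiers and that a set of relevant documents is known for that query. The function names and sample identifiers are hypothetical, and average precision is used here only as one common stand-in for "ranking quality"; the abstract does not say which ranking measure the authors used.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Return (precision, recall) computed over the top-k results of one query."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant document appears.

    Assumption: used as an illustrative ranking-quality measure, not
    necessarily the one reported in the paper.
    """
    relevant = set(relevant_ids)
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    # Hypothetical ranked answer and relevance judgments for a single query.
    ranked = ["d3", "d1", "d7", "d2"]
    relevant = {"d1", "d2", "d9"}
    print(precision_recall_at_k(ranked, relevant, k=2))
    print(average_precision(ranked, relevant))
```

Comparing retrieval methods per field type, as the abstract suggests, would amount to computing such scores separately for queries against each field and averaging over queries.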
Similar resources
Similarity Searching in Text Databases with Multiple Field Types
Similarity searching in text databases with multiple field types is still an open problem. We experimented with CORDIS and we evaluated the effectiveness of many text retrieval methods in terms of precision, recall and ranking quality.
Document clustering based on ontology and a fuzzy approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, one of the important techniques of text mining, is the unsupervised classification of similar documents into different groups. The most important step...
The Protein Information Resource (PIR)
The Protein Information Resource (PIR) produces the largest, most comprehensive, annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Sequence Database (JIPID). The expanded PIR WWW site allows sequence similarity and text sea...
WebCSD: the online portal to the Cambridge Structural Database
WebCSD, a new web-based application developed by the Cambridge Crystallographic Data Centre, offers fast searching of the Cambridge Structural Database using only a standard internet browser. Search facilities include two-dimensional substructure, molecular similarity, text/numeric and reduced cell searching. Text, chemical diagrams and three-dimensional structural information can all be studie...
Database Schema Matching using Corpus-based Semantic Similarity and Word Segmentation
In this paper, we present a new method for database schema matching, the problem of identifying elements of two given schemas that correspond to each other. We use two methods based on a large text corpus: one method for determining the semantic similarity of two target words and the other for automatic word segmentation. We present a name-based element-level database schema matching method tha...
Journal title:
- Softw., Pract. Exper.
Volume 30 (issue not given)
Pages: not given
Publication date: 2000